Abstract:
The success of end-to-end speech-to-text translation (ST) is often achieved by utilizing source transcripts, e.g., by pre-training with automatic speech recognition (ASR) and machine translation (MT) tasks, or by introducing additional ASR and MT data. Unfortunately, transcripts are not always available, since numerous unwritten languages exist worldwide. In this paper, we aim to utilize large amounts of target-side monolingual data to enhance ST without transcripts. Motivated by the remarkable success of back translation in MT, we develop a back translation algorithm for ST (BT4ST) to synthesize pseudo ST data from monolingual target data. To ease the challenges posed by short-to-long generation and one-to-many mapping, we introduce self-supervised discrete units and achieve back translation by cascading a target-to-unit model and a unit-to-speech model. With our synthetic ST data, we achieve an average boost of 2.3 BLEU on the MuST-C En→De, En→Fr, and En→Es datasets. Further experiments show that our method is especially effective in low-resource scenarios.
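The cascaded synthesis pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: `target_to_unit` and `unit_to_speech` are toy stand-ins for what would be trained neural models, and the unit/speech representations are placeholders.

```python
def target_to_unit(text):
    """Hypothetical stand-in: map target text to self-supervised discrete units.
    In BT4ST this would be a trained target-to-unit model."""
    return [ord(ch) % 100 for ch in text]

def unit_to_speech(units):
    """Hypothetical stand-in: map discrete units to a pseudo speech signal.
    In BT4ST this would be a trained unit-to-speech model."""
    return [float(u) / 100.0 for u in units]  # placeholder waveform-like list

def back_translate(monolingual_targets):
    """Synthesize pseudo (speech, target) ST pairs from target-side monolingual
    text by cascading the two models, mirroring back translation in MT."""
    pseudo_pairs = []
    for target in monolingual_targets:
        units = target_to_unit(target)         # short text -> longer unit sequence
        speech = unit_to_speech(units)         # units -> pseudo source speech
        pseudo_pairs.append((speech, target))  # train ST on (pseudo speech, target)
    return pseudo_pairs

pairs = back_translate(["guten Tag", "wie geht es dir"])
```

The intermediate discrete units are what make the short-to-long, one-to-many text-to-speech mapping tractable: each cascade stage handles a smaller expansion.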
Abstract:
How to achieve better end-to-end speech translation (ST) by leveraging (text) machine translation (MT) data? Among various existing techniques, multi-task learning is one of the effective ways to share knowledge between ST and MT in which additional MT data can help to learn source-to-target mapping. However, due to the differences between speech and text, there is always a gap between ST and MT. In this paper, we first aim to understand this modality gap from the target-side representation differences, and link the modality gap to another well-known problem in neural machine translation: exposure bias. We find that the modality gap is relatively small during training except for some difficult cases, but keeps increasing during inference due to the cascading effect. To address these problems, we propose the Cross-modal Regularization with Scheduled Sampling (CRESS) method. Specifically, we regularize the output predictions of ST and MT, whose target-side contexts are derived by sampling between ground truth words and self-generated words with a varying probability. Furthermore, we introduce token-level adaptive training which assigns different training weights to target tokens to handle difficult cases with large modality gaps. Experiments and analysis show that our approach effectively bridges the modality gap, and achieves promising results in all eight directions of the MuST-C dataset.
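The scheduled-sampling step described above can be sketched in a few lines. This is a simplified illustration of the sampling mechanism only (per-position mixing of ground-truth and self-generated tokens), not the full CRESS regularization objective; the function name and token lists are hypothetical.

```python
import random

def sample_context(ground_truth, predicted, p_ground_truth, rng=random):
    """Build a target-side context by sampling, at each position, between the
    ground-truth token and the model's self-generated token. p_ground_truth
    is typically decayed over training so the model gradually conditions on
    its own predictions, mitigating exposure bias."""
    return [gt if rng.random() < p_ground_truth else pred
            for gt, pred in zip(ground_truth, predicted)]

gt = ["we", "regularize", "the", "predictions"]
pred = ["we", "normalize", "a", "outputs"]
# Edge cases: p = 1.0 yields a pure ground-truth context (random() is in
# [0.0, 1.0), so it is always < 1.0); p = 0.0 yields a pure self-generated one.
assert sample_context(gt, pred, 1.0) == gt
assert sample_context(gt, pred, 0.0) == pred
```

Regularizing ST and MT predictions over such mixed contexts, rather than only teacher-forced ones, is what lets the method target the inference-time growth of the modality gap.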
Abstract:
Simultaneous machine translation (SiMT) outputs the translation while reading the source sentence and hence requires a policy to decide whether to wait for the next source word (READ) or generate a target word (WRITE); these actions form a read/write path. Although the read/write path is essential to SiMT performance, no direct supervision is given to the path in existing methods. In this paper, we propose a dual-path SiMT method which introduces duality constraints to direct the read/write path. According to the duality constraints, the read/write paths of the source-to-target and target-to-source SiMT models can be mapped to each other. As a result, the two SiMT models can be optimized jointly by forcing their read/write paths to satisfy this mapping. Experiments on En↔Vi and De↔En tasks show that our method outperforms strong baselines under all latency levels.
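The path mapping can be visualized on a (source, target) grid: a READ is a step along the source axis and a WRITE a step along the target axis, so swapping the translation direction transposes the grid and turns each READ into a WRITE and vice versa. A minimal sketch of that mapping (an illustration of the duality constraint's structure, not the paper's training objective):

```python
def dual_path(actions):
    """Map a read/write path to its dual. On the (source, target) grid, READ
    ('R') moves along the source axis and WRITE ('W') along the target axis;
    transposing the grid (i.e., swapping translation direction) exchanges the
    two actions while preserving their order."""
    swap = {"R": "W", "W": "R"}
    return [swap[a] for a in actions]

path = list("RRWRWW")   # read 2 source words, write 1, read 1, write 2
dual = dual_path(path)  # the corresponding target-to-source path
assert dual_path(dual) == path  # the mapping is an involution
```

Forcing the two models' paths to satisfy this mapping gives each path an implicit supervision signal from the other direction.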
Abstract:
Polarization in American politics has been extensively documented and analyzed for decades, and the phenomenon became all the more apparent during the 2016 presidential election, where Trump and Clinton depicted two radically different pictures of America. Inspired by this gaping polarization and the extensive utilization of Twitter during the 2016 presidential campaign, in this paper we take the first step in measuring polarization in social media and attempt to predict individuals' Twitter following behavior by analyzing their everyday tweets, profile images, and posted pictures. As such, we treat polarization as a classification problem and study to what extent Trump followers and Clinton followers on Twitter can be distinguished, which in turn serves as a metric of polarization in general. We apply an LSTM to process tweet features and extract visual features using the VGG neural network. Integrating these two sets of features boosts the overall performance. We achieve an accuracy of 69%, suggesting that the high degree of polarization recorded in the literature has started to manifest itself in social media as well.
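The fusion step — combining textual and visual features for a single prediction — can be sketched as a simple late-fusion classifier. This is an illustrative toy, not the paper's model: the feature vectors, weights, and function name are hypothetical placeholders standing in for the trained LSTM and VGG outputs.

```python
import math

def fuse_and_classify(text_feat, visual_feat, weights, bias=0.0):
    """Late fusion sketch: concatenate an LSTM-style tweet feature vector with
    a VGG-style image feature vector, then apply a logistic classifier over
    the fused vector to predict a follower class."""
    fused = list(text_feat) + list(visual_feat)    # feature concatenation
    score = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-score))          # probability of class 1

text_feat = [0.2, -0.5, 1.0]  # hypothetical LSTM summary of a user's tweets
visual_feat = [0.7, 0.1]      # hypothetical VGG features of posted images
prob = fuse_and_classify(text_feat, visual_feat,
                         weights=[0.5, -0.2, 0.3, 0.4, 0.1])
```

The point of the concatenation is that the classifier can weigh textual and visual evidence jointly, which is what gives the reported boost over either modality alone.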
Abstract:
The shipping market is constantly volatile owing to complex influencing factors. To better evaluate the conditions of the shipping market and provide proper market early warning (EW), the authors consider the volume, value level, and other factors of the shipping industry, apply the Analytic Hierarchy Process (AHP) to establish an index system of shipping prosperity, and employ time-difference correlation analysis and cluster analysis. The indicators are then classified into three categories: leading, coincident, and lagging indicators. Drawing on the theory of the economic cycle, diffusion indices and composite indices of global shipping prosperity are constructed using factor analysis; the composite indices and market situation are predicted with the prosperity index curve and warning signals, and statistics on the Shanghai and global maritime economy over the past three years are collected as a case study. The case presents the Shanghai dry bulk shipping prosperity indices and their results, and comprehensively assesses the dynamic process of global industry prosperity. These early warning methods could be applied to shipping market monitoring problems. This study aims to develop an early warning system for the shipping market and to evaluate its performance; the system can provide early indications of a market crisis by using the prosperity index.
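A diffusion index of the kind the abstract mentions has a standard business-cycle definition: the percentage of indicators rising at a given time. A minimal sketch, with hypothetical indicator series (the function name and data are illustrative, not from the study):

```python
def diffusion_index(series_list, t):
    """Diffusion index at time t: the percentage of indicators that rose from
    t-1 to t. Values above 50 conventionally suggest expansion, below 50
    contraction."""
    rising = sum(1 for s in series_list if s[t] > s[t - 1])
    return 100.0 * rising / len(series_list)

# Three hypothetical shipping indicators (e.g., volume, freight rate, value level)
indicators = [
    [100, 103, 101],
    [50, 52, 55],
    [10, 9, 11],
]
di = diffusion_index(indicators, 1)  # two of three indicators rose
```

Composite indices additionally weight and aggregate the indicator movements (via factor analysis in this study) rather than merely counting directions.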
Abstract:
Simultaneous machine translation (SiMT) starts outputting the translation while reading the source sentence and needs a precise policy to decide when to output the generated translation. The policy therefore determines the number of source tokens read during the translation of each target token. However, it is difficult to learn a precise translation policy that achieves good latency-quality trade-offs, because there is no golden policy corresponding to the parallel sentences to serve as explicit supervision. In this paper, we present a new method for constructing the optimal policy online via binary search. With this explicit supervision, our approach enables the SiMT model to learn the optimal policy, which can guide the model in completing the translation during inference. Experiments on four translation tasks show that our method exceeds strong baselines across all latency scenarios.
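One way to picture the binary-search construction: for a given target token, find the smallest source prefix under which the model is confident enough to write it. The sketch below is a simplification under an assumed monotonicity of confidence in the prefix length; the function names and the threshold criterion are illustrative, not the paper's exact formulation.

```python
def earliest_sufficient_prefix(confidence, n_src, threshold):
    """Binary search for the smallest number of read source tokens j such that
    confidence(j) >= threshold. Assumes confidence is non-decreasing in j --
    a simplifying assumption for this sketch."""
    lo, hi = 1, n_src
    while lo < hi:
        mid = (lo + hi) // 2
        if confidence(mid) >= threshold:
            hi = mid       # enough context already; try reading less
        else:
            lo = mid + 1   # not enough context; must read more source tokens
    return lo

# Toy monotone confidence: each extra source token adds 0.2 confidence.
conf = lambda j: min(1.0, 0.2 * j)
assert earliest_sufficient_prefix(conf, n_src=10, threshold=0.7) == 4
```

Searching instead of scanning all prefix lengths is what makes constructing the policy online cheap: O(log n) confidence evaluations per target token rather than O(n).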
Abstract:
Many methods have been developed to help people find the video content they want efficiently. However, there are still some unsolved problems in this area. For example, given a query video and a reference video, how to accurately localize a segment in the reference video such that the segment semantically corresponds to the query video? We define a distinctively new task, namely video re-localization, to address this need. Video re-localization is an important enabling technology with many applications, such as fast seeking in videos, video copy detection, as well as video surveillance. Meanwhile, it is also a challenging research task because the visual appearance of a semantic concept in videos can have large variations. The first hurdle to clear for the video re-localization task is the lack of existing datasets. It is labor-intensive to collect pairs of videos with semantic coherence or correspondence and label the corresponding segments. We first exploit and reorganize the videos in ActivityNet to form a new dataset for video re-localization research, which consists of about 10,000 videos of diverse visual appearances associated with the localized boundary information. Subsequently, we propose an innovative cross-gated bilinear matching model such that every time-step in the reference video is matched against the attentively weighted query video. Consequently, the prediction of the starting and ending time is formulated as a classification problem based on the matching results. Extensive experimental results show that the proposed method outperforms the baseline methods. Our code is available at: https://github.com/fengyang0317/video_reloc.
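The core matching idea — scoring each reference time-step against an attention-weighted summary of the query — can be sketched as below. This is a reduced illustration only: the cross-gating and full bilinear form of the paper's model are omitted, and all feature vectors are toy placeholders.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_weighted_query(query_feats, ref_step):
    """Attend over query time-steps with respect to one reference time-step
    (softmax over dot-product scores) and return the weighted query summary."""
    scores = [dot(q, ref_step) for q in query_feats]
    m = max(scores)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(query_feats[0])
    return [sum(w * q[d] for w, q in zip(weights, query_feats))
            for d in range(dim)]

def match_scores(ref_feats, query_feats):
    """Score each reference time-step against its attention-weighted query;
    higher scores mark time-steps that resemble the query content."""
    return [dot(r, attention_weighted_query(query_feats, r))
            for r in ref_feats]

query = [[1.0, 0.0], [0.0, 1.0]]
ref = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
scores = match_scores(ref, query)
```

Per-time-step matching scores like these are what the start/end-boundary classifier is built on: boundaries are predicted where the matching profile rises and falls.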